Contracts: HPC-Specific CUI Compliance Roles
Branch: 004-hpc-cui-roles | Date: 2026-02-15
Overview
This specification does not expose external REST/GraphQL APIs. Instead, it defines internal contracts between HPC components:
- Prolog/Epilog Script Contracts - Exit codes and output format
- Container Wrapper Contract - Execution logging format
- ACL Sync Daemon Contract - FreeIPA event handling
- Onboarding Playbook Variables - Input/output specification
- Health Check Script Contract - Status reporting format
1. Prolog Script Contract
Location: /etc/slurm/prolog.d/cui_prolog.sh
Executor: Slurm slurmd (as root on each allocated compute node)
Input (Environment Variables)
# Provided by Slurm
SLURM_JOB_ID # Job ID (integer)
SLURM_JOB_USER # Submitting username
SLURM_JOB_ACCOUNT # Slurm account (maps to CUI project)
SLURM_JOB_PARTITION # Partition name
SLURM_JOB_NODELIST # Allocated nodes
SLURM_JOB_UID # User's numeric UID
SLURM_JOB_GID # User's numeric GID
Output
Exit Codes:
| Code | Meaning | Slurm Behavior |
|---|---|---|
| 0 | Authorization passed | Job proceeds |
| 1 | Authorization failed | Job killed, user notified |
| 2 | Transient error | Job requeued (Slurm handles retry) |
Stdout (logged to syslog):
CUI_PROLOG: job=12345 user=jdoe account=cui_nasa2026 auth=PASS training=VALID
Stderr (on failure):
CUI_PROLOG_ERROR: Authorization failed - CUI training expired on 2026-01-15
Syslog Format
facility: local0
priority: info (success), err (failure)
tag: cui_prolog
message: JSON object
{
"event": "job_start",
"job_id": 12345,
"user": "jdoe",
"account": "cui_nasa2026",
"partition": "cui",
"nodes": ["node001", "node002"],
"authorization": "pass",
"training_verified": true,
"timestamp": "2026-02-15T10:30:00Z"
}
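A minimal sketch of a prolog implementing this contract follows. Only the exit codes, stdout line, and syslog tag come from the contract above; `check_cui_authorization` is a hypothetical helper standing in for the site's FreeIPA/training lookup, not a shipped command.

```bash
#!/usr/bin/env bash
# Sketch of cui_prolog.sh; exit codes follow the contract table above.
# check_cui_authorization is a HYPOTHETICAL helper (e.g. a FreeIPA lookup
# of project membership and CUI training expiry), not a shipped command.
set -u

check_cui_authorization "$SLURM_JOB_USER" "$SLURM_JOB_ACCOUNT"
rc=$?

case "$rc" in
  0)
    echo "CUI_PROLOG: job=$SLURM_JOB_ID user=$SLURM_JOB_USER account=$SLURM_JOB_ACCOUNT auth=PASS training=VALID"
    logger -p local0.info -t cui_prolog \
      "{\"event\": \"job_start\", \"job_id\": $SLURM_JOB_ID, \"user\": \"$SLURM_JOB_USER\", \"authorization\": \"pass\"}"
    exit 0
    ;;
  2)
    echo "CUI_PROLOG_ERROR: transient error reaching authorization source" >&2
    exit 2          # Slurm requeues the job
    ;;
  *)
    echo "CUI_PROLOG_ERROR: Authorization failed for user $SLURM_JOB_USER" >&2
    logger -p local0.err -t cui_prolog \
      "{\"event\": \"job_start\", \"job_id\": $SLURM_JOB_ID, \"user\": \"$SLURM_JOB_USER\", \"authorization\": \"fail\"}"
    exit 1
    ;;
esac
```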
2. Epilog Script Contract
Location: /etc/slurm/epilog.d/cui_epilog.sh
Executor: Slurm slurmd (as root on each allocated compute node)
Input (Environment Variables)
Same as prolog, plus:
SLURM_JOB_EXIT_CODE # Job exit code
Output
Exit Codes:
| Code | Meaning | Slurm Behavior |
|---|---|---|
| 0 | Sanitization complete | Node returns to pool |
| 1 | Sanitization failed | Node drained with reason |
Stdout (logged to syslog):
CUI_EPILOG: job=12345 shm=CLEARED tmp=CLEARED gpu=RESET health=PASS
Syslog Format
{
"event": "job_end",
"job_id": 12345,
"user": "jdoe",
"exit_code": 0,
"sanitization": {
"shm_cleared": true,
"tmp_cleared": true,
"gpu_reset": true,
"gpu_count": 4
},
"health_check": "pass",
"duration_seconds": 45,
"timestamp": "2026-02-15T12:45:00Z"
}
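A sketch of the sanitization flow under this contract is below. The `find`-based clearing and the GPU reset invocation are illustrative; the exact `nvidia-smi` form (per-GPU `-i` index, driver support) varies by site and is an assumption here.

```bash
#!/usr/bin/env bash
# Sketch of cui_epilog.sh; exit codes follow the contract table above
# (0 = node returns to pool, 1 = node drained).
set -u
status=0

# Clear shared memory and the job user's /tmp files
find /dev/shm -mindepth 1 -delete || status=1
find /tmp -mindepth 1 -user "$SLURM_JOB_USER" -delete || status=1

# Reset GPU state if GPUs are present (invocation details assumed)
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --gpu-reset >/dev/null 2>&1 || status=1
fi

if [ "$status" -eq 0 ]; then
  echo "CUI_EPILOG: job=$SLURM_JOB_ID shm=CLEARED tmp=CLEARED gpu=RESET health=PASS"
else
  echo "CUI_EPILOG_ERROR: sanitization failed for job $SLURM_JOB_ID" >&2
fi
exit "$status"
```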
3. Container Wrapper Contract
Location: /usr/local/bin/apptainer-cui
Executor: User (via Slurm job)
Input (Command Line)
apptainer-cui run [options] <image.sif> [command]
apptainer-cui exec [options] <image.sif> <command>
apptainer-cui shell [options] <image.sif>
Enforced Options
The wrapper automatically adds:
--net --network=none # Network isolation (always)
--bind /dev/infiniband # InfiniBand passthrough (if MPI job)
--no-home # Don't mount user home
--bind /cui/projects/$PROJECT:/data # CUI data path only
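A sketch of how the wrapper might inject these options before delegating to the real apptainer binary is shown below. The `CUI_PROJECT` variable and the MPI-job heuristic (task count greater than 1) are assumptions, not part of the contract.

```bash
#!/usr/bin/env bash
# Sketch of enforced-option injection in apptainer-cui.
set -u
subcmd=$1; shift

# PROJECT resolution is assumed (e.g. exported by the prolog); hypothetical
PROJECT=${CUI_PROJECT:?CUI_PROJECT not set}

opts=(--net --network=none --no-home --bind "/cui/projects/${PROJECT}:/data")

# Pass InfiniBand devices through only when the job looks like MPI
if [ "${SLURM_NTASKS:-1}" -gt 1 ]; then
  opts+=(--bind /dev/infiniband)
fi

exec /usr/bin/apptainer "$subcmd" "${opts[@]}" "$@"
```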
Output
Exit Codes:
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Container execution failed |
| 126 | Signature verification failed |
| 127 | Image not found |
Stderr (on signature failure):
ERROR: Container signature verification failed.
Image: /path/to/container.sif
Hash: sha256:abc123...
To use containers in the CUI enclave, images must be signed.
See: https://docs.example.edu/cui/container-signing
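The contract does not mandate a particular verification mechanism. A plausible gate, using Apptainer's stock `apptainer verify` signature check and the exit code from the table above, might look like:

```bash
# Sketch: refuse to run unsigned or tampered images (exit 126 per the table)
if ! apptainer verify "$image" >/dev/null 2>&1; then
  {
    echo "ERROR: Container signature verification failed."
    echo "Image: $image"
    echo "See: https://docs.example.edu/cui/container-signing"
  } >&2
  exit 126
fi
```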
Audit Log Format
Location: /var/log/apptainer-cui/executions.json (JSON lines)
{
"execution_id": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2026-02-15T10:30:00Z",
"username": "jdoe",
"slurm_job_id": 12345,
"hostname": "node001",
"image_path": "/cui/containers/gromacs-2024.sif",
"image_hash": "sha256:abc123...",
"command": "run",
"args": ["mdrun", "-s", "topol.tpr"],
"bind_mounts": ["/cui/projects/cui-nasa2026:/data"],
"network_mode": "none",
"infiniband_enabled": true,
"exit_code": 0,
"duration_seconds": 3600
}
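Because the audit log is JSON Lines, it can be queried directly with jq. For example (field names from the record above; the username value is illustrative):

```bash
# All failed executions (non-zero exit code) for one user
jq -c 'select(.username == "jdoe" and .exit_code != 0)' \
  /var/log/apptainer-cui/executions.json
```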
4. ACL Sync Daemon Contract
Location: /usr/local/lib/cui-acl-sync/acl_sync.py
Executor: systemd service (as root)
Configuration
Location: /etc/cui-acl-sync/config.yml
freeipa:
server: ipa.example.edu
base_dn: dc=example,dc=edu
group_prefix: cuiproj-
filesystem:
type: lustre # or beegfs
project_root: /cui/projects
sync:
batch_window_seconds: 30
max_retries: 3
logging:
level: INFO
file: /var/log/cui-acl-sync/sync.log
IPC (systemd socket)
Socket: /run/cui-acl-sync/control.sock
Commands:
| Command | Response |
|---|---|
| STATUS | JSON status object |
| SYNC <project_id> | Force immediate sync |
| RELOAD | Reload configuration |
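The contract does not name a client tool; any UNIX-socket client works. A sketch with socat:

```bash
# Query daemon status over the control socket
echo STATUS | socat - UNIX-CONNECT:/run/cui-acl-sync/control.sock

# Force an immediate sync for one project
echo "SYNC cui-nasa2026" | socat - UNIX-CONNECT:/run/cui-acl-sync/control.sock
```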
Status Response
{
"daemon": "running",
"uptime_seconds": 86400,
"last_sync": "2026-02-15T10:30:00Z",
"syncs_performed": 1234,
"errors_last_hour": 0,
"watching_groups": 15,
"pending_changes": 0
}
Event Log Format
{
"event": "acl_sync",
"timestamp": "2026-02-15T10:30:00Z",
"project_id": "cui-nasa2026",
"group": "cuiproj-cui-nasa2026",
"action": "update",
"members_added": ["newuser"],
"members_removed": [],
"directories_updated": 1,
"duration_ms": 250
}
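For reference, the filesystem-side effect of one such sync event is roughly equivalent to the POSIX ACL commands below (a sketch: the exact ACL mode and recursion policy are the daemon's to decide, and the Lustre/BeeGFS mount must have POSIX ACL support enabled).

```bash
# Grant the project group rwx on the project tree, plus a default ACL
# so newly created files inherit the same access
setfacl -R  -m g:cuiproj-cui-nasa2026:rwx /cui/projects/cui-nasa2026
setfacl -Rd -m g:cuiproj-cui-nasa2026:rwx /cui/projects/cui-nasa2026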
5. Onboarding Playbook Contract
Playbook: playbooks/onboard_project.yml
Input: Extra variables
Required Variables
project_id: "cui-nasa2026" # Unique project identifier
project_name: "NASA Mars Data 2026" # Human-readable name
pi_username: "jdoe" # PI's FreeIPA username
pi_email: "jdoe@example.edu" # PI contact email
storage_quota_gb: 5000 # Storage allocation
team_members: # Initial team (optional)
- username: "rsmith"
role: "researcher"
- username: "mjones"
role: "researcher"
Output Facts
After a successful run, the playbook sets these Ansible facts:
cui_project_created:
project_id: "cui-nasa2026"
freeipa_group: "cuiproj-cui-nasa2026"
slurm_account: "cui_nasa2026"
storage_path: "/cui/projects/cui-nasa2026"
welcome_packet: "/tmp/cui-nasa2026-welcome.md"
Generated Artifacts
| Artifact | Location |
|---|---|
| FreeIPA group | cuiproj-{project_id} |
| Slurm account | {project_id} with hyphens replaced by underscores (e.g. cui_nasa2026) |
| Storage directory | /cui/projects/{project_id} |
| Welcome packet | /tmp/{project_id}-welcome.md |
| Evidence record | data/onboarding/{project_id}.json |
6. Offboarding Playbook Contract
Playbook: playbooks/offboard_project.yml
Input: Extra variables
Required Variables
project_id: "cui-nasa2026" # Project to offboard
data_disposition: "sanitize" # sanitize | archive
archive_destination: "" # Required if archive
force: false # Skip grace period
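Example invocation (values illustrative):

```bash
ansible-playbook playbooks/offboard_project.yml \
  -e project_id=cui-nasa2026 \
  -e data_disposition=sanitize \
  -e force=false
```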
State Machine
initiated → blocking_submissions → grace_period → revoking_access → sanitizing → completed
Output Facts
cui_project_offboarded:
project_id: "cui-nasa2026"
status: "completed"
jobs_at_initiation: 3
jobs_completed_in_grace: 3
grace_period_ended: "2026-02-16T10:00:00Z"
sanitization_method: "clear"
sanitization_verified: true
evidence_file: "data/offboarding/cui-nasa2026.json"
7. Health Check Script Contract
Location: /etc/slurm/healthcheck.d/cui_health.sh
Executor: Slurm slurmd (via HealthCheckProgram)
Input
None (reads local system state)
Output
Exit Codes:
| Code | Meaning | Slurm Action |
|---|---|---|
| 0 | Node healthy | Available for jobs |
| 1 | Node unhealthy | Drain with reason |
Stdout (on failure):
UNHEALTHY: SSSD not responding
Health Checks Performed
| Check | Pass Condition |
|---|---|
| SSSD | systemctl is-active sssd |
| Auditd | systemctl is-active auditd |
| Mounts | All required mounts present |
| GPU | nvidia-smi returns 0 (if GPUs present) |
| Memory | /dev/shm cleared flag present |
| Quarantine | No /var/run/cui-quarantine file |
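A minimal sketch implementing the checks in this table; the required-mount list and the shm-cleared flag path are assumptions.

```bash
#!/usr/bin/env bash
# Sketch of cui_health.sh; prints one UNHEALTHY line and exits 1 on the
# first failing check, exits 0 when all checks pass.
set -u

fail() { echo "UNHEALTHY: $1"; exit 1; }

systemctl is-active --quiet sssd   || fail "SSSD not responding"
systemctl is-active --quiet auditd || fail "auditd not running"

for m in /cui/projects /dev/shm; do          # mount list is an assumption
  mountpoint -q "$m" || fail "required mount missing: $m"
done

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi >/dev/null 2>&1 || fail "nvidia-smi returned non-zero"
fi

[ -f /run/cui/shm_cleared ] || fail "/dev/shm cleared flag missing"  # flag path assumed
[ -e /var/run/cui-quarantine ] && fail "node quarantined"

exit 0
```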
8. Sanitization Evidence Contract
Producer: Epilog script, offboarding playbook
Consumer: Spec 003 evidence collection
File Format
Location: data/sanitization/{hostname|project_id}_{timestamp}.json
{
"type": "node_sanitization",
"hostname": "node001",
"timestamp": "2026-02-15T12:45:00Z",
"method": "clear",
"scope": "job_memory",
"job_id": 12345,
"actions": [
{
"target": "/dev/shm",
"method": "overwrite_zeros",
"size_bytes": 134217728,
"duration_ms": 500,
"verified": true
},
{
"target": "/tmp",
"method": "secure_delete",
"files_removed": 42,
"duration_ms": 1200,
"verified": true
},
{
"target": "gpu_memory",
"method": "nvidia_reset",
"gpu_count": 4,
"duration_ms": 8000,
"verified": true
}
],
"overall_status": "success",
"performed_by": "cui_epilog",
"verification": {
"method": "pattern_test",
"result": "pass"
}
}
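A consumer-side spot check of a record might look like the following (a sketch; the file name is illustrative):

```bash
# Assert overall success and that every sanitization action was verified
jq -e '.overall_status == "success" and (.actions | all(.verified))' \
  data/sanitization/node001_20260215T124500.json
```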
Contract Versioning
All contracts follow semantic versioning.
| Contract | Current Version | Last Updated |
|---|---|---|
| Prolog Script | 1.0.0 | 2026-02-15 |
| Epilog Script | 1.0.0 | 2026-02-15 |
| Container Wrapper | 1.0.0 | 2026-02-15 |
| ACL Sync Daemon | 1.0.0 | 2026-02-15 |
| Onboarding Playbook | 1.0.0 | 2026-02-15 |
| Offboarding Playbook | 1.0.0 | 2026-02-15 |
| Health Check | 1.0.0 | 2026-02-15 |
| Sanitization Evidence | 1.0.0 | 2026-02-15 |